[AMDGPU] fold memref.subview/expand_shape/collapse_shape into amdgpu.gather_to_lds #149851

Merged 9 commits on Jul 23, 2025

lialan (Member) commented Jul 21, 2025

This PR adds a new optimization pass to fold memref.subview/expand_shape/collapse_shape ops into consumer amdgpu.gather_to_lds operations.

  • Implements a new pass AmdgpuFoldMemRefOpsPass with pattern FoldMemRefOpsIntoGatherToLDSOp
  • Adds corresponding folding tests

@lialan lialan requested a review from Copilot July 21, 2025 16:59
@lialan lialan requested a review from Copilot July 21, 2025 17:07
@lialan lialan requested a review from Copilot July 21, 2025 18:21

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements a new optimization pass to fold memref.subview operations into amdgpu.gather_to_lds operations, which can simplify IR and improve performance by eliminating intermediate subview operations.

  • Adds AmdgpuFoldSubviewOpsPass with pattern FoldSubviewIntoGatherToLDSOp that identifies and folds subview sources
  • Implements index resolution using affine maps to adjust indices when folding subviews with offsets
  • Adds comprehensive test coverage for both zero-offset and non-zero offset subview folding scenarios
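
The index adjustment described in the second bullet can be pictured with a small standalone sketch. It is not the MLIR implementation: `Subview2D` and `resolveSourceIndices` are hypothetical names, the rank is fixed at 2, and rank-reducing subviews (dropped dimensions) are ignored. Under those assumptions, each source index is the subview offset plus the subview index times the subview stride:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of a rank-2 subview. Illustrative only; not an MLIR type.
struct Subview2D {
  std::array<int64_t, 2> offsets; // start of the view inside the source
  std::array<int64_t, 2> strides; // step of the view, in source elements
};

// Maps an index into the subview to the matching index into the source:
//   source[d] = offset[d] + index[d] * stride[d]
std::array<int64_t, 2> resolveSourceIndices(const Subview2D &view,
                                            std::array<int64_t, 2> indices) {
  std::array<int64_t, 2> result{};
  for (std::size_t d = 0; d < indices.size(); ++d) {
    // Assumption of this sketch: indices into the view are non-negative.
    assert(indices[d] >= 0 && "indices into the view are non-negative");
    result[d] = view.offsets[d] + indices[d] * view.strides[d];
  }
  return result;
}
```

For offsets [32, 64] and unit strides, `resolveSourceIndices` maps (i, j) to (i + 32, j + 64), which has the same shape as the `s0 + 32` and `s0 + 64` affine maps expected by the second test case in the diff.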

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Files reviewed:
  • mlir/lib/Dialect/AMDGPU/Transforms/FoldSubviewOps.cpp: Core implementation of the folding pass and pattern
  • mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.td: Pass definition and documentation
  • mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.h: Pass declarations and pattern population function
  • mlir/lib/Dialect/AMDGPU/Transforms/CMakeLists.txt: Build system integration for new source file
  • mlir/test/Dialect/AMDGPU/amdgpu-fold-subviews.mlir: Test cases validating the folding optimization

github-actions bot commented Jul 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@lialan lialan marked this pull request as ready for review July 21, 2025 18:49
@lialan lialan requested review from krzysz00 and qedawkins July 21, 2025 18:50
llvmbot (Member) commented Jul 21, 2025

@llvm/pr-subscribers-mlir-memref

@llvm/pr-subscribers-mlir-gpu

Author: Alan Li (lialan)

Changes

This PR adds a new optimization pass to fold memref.subview operations into amdgpu.gather_to_lds operations, simplifying the overall operation and potentially improving performance. The pass identifies when a GatherToLDSOp has a memref.subview as its source and attempts to fold the subview by adjusting the indices accordingly.

  • Implements a new pass AmdgpuFoldSubviewOpsPass with pattern FoldSubviewIntoGatherToLDSOp
  • Adds corresponding folding test
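
One detail of the matching step is worth spelling out: in MLIR, `Value::getDefiningOp()` returns null when the value is a block argument, so a matcher has to tolerate a missing defining op before checking its kind (the templated `getDefiningOp<OpTy>()` form does this). The sketch below models that check with hypothetical stand-in types (`Op`, `Value`, `OpKind`, `matchSubviewSource`, `foldedSource`); none of them are MLIR classes.

```cpp
#include <cassert>

struct Op; // forward declaration so Value can point at it

// Hypothetical stand-in for an SSA value.
struct Value {
  const Op *definingOp; // null models a block argument (no defining op)
};

enum class OpKind { SubView, Alloc };

// Hypothetical stand-in for an operation.
struct Op {
  OpKind kind;
  Value source; // for SubView: the memref being viewed; unused for Alloc
};

// Returns the defining op only if it exists and is a subview, else null.
// The null check comes first, so block arguments are rejected rather than
// dereferenced.
const Op *matchSubviewSource(const Value &src) {
  const Op *def = src.definingOp;
  if (def == nullptr || def->kind != OpKind::SubView)
    return nullptr;
  return def;
}

// Returns the value the consumer should read from once the subview is folded.
// Precondition: `subview` came from a successful matchSubviewSource call.
Value foldedSource(const Op *subview) {
  assert(subview != nullptr && subview->kind == OpKind::SubView);
  return subview->source;
}
```

In this model a consumer whose source is a block argument, or is produced by something other than a subview, simply fails to match and is left unchanged.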

Full diff: https://github.com/llvm/llvm-project/pull/149851.diff

5 Files Affected:

  • (modified) mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.h (+5-1)
  • (modified) mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.td (+12)
  • (modified) mlir/lib/Dialect/AMDGPU/Transforms/CMakeLists.txt (+2-1)
  • (added) mlir/lib/Dialect/AMDGPU/Transforms/FoldSubviewOps.cpp (+67)
  • (added) mlir/test/Dialect/AMDGPU/amdgpu-fold-subviews.mlir (+50)
diff --git a/mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.h b/mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.h
index cc2f543e79f69..a61903609aaff 100644
--- a/mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.h
+++ b/mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.h
@@ -22,8 +22,9 @@ class ConversionTarget;
 namespace amdgpu {
 
 #define GEN_PASS_DECL_AMDGPUEMULATEATOMICSPASS
-#define GEN_PASS_DECL_AMDGPURESOLVESTRIDEDMETADATAPASS
+#define GEN_PASS_DECL_AMDGPUFOLDSUBVIEWOPSPASS
 #define GEN_PASS_DECL_AMDGPUMASKEDLOADTOLOADPASS
+#define GEN_PASS_DECL_AMDGPURESOLVESTRIDEDMETADATAPASS
 #define GEN_PASS_REGISTRATION
 #include "mlir/Dialect/AMDGPU/Transforms/Passes.h.inc"
 
@@ -38,6 +39,9 @@ void populateAmdgpuResolveStridedMetadataPatterns(RewritePatternSet &patterns,
 void populateAmdgpuMaskedloadToLoadPatterns(RewritePatternSet &patterns,
                                             PatternBenefit benefit = 1);
 
+void populateAmdgpuFoldSubviewOpsPatterns(RewritePatternSet &patterns,
+                                          PatternBenefit benefit = 1);
+
 } // namespace amdgpu
 } // namespace mlir
 
diff --git a/mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.td b/mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.td
index 8d0e6829ab0cc..fad939ced9877 100644
--- a/mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/AMDGPU/Transforms/Passes.td
@@ -70,4 +70,16 @@ def AmdgpuMaskedloadToLoadPass : Pass<"amdgpu-maskedload-to-load"> {
     "memref::MemRefDialect"
   ];
 }
+
+def AmdgpuFoldSubviewOpsPass : Pass<"amdgpu-fold-subview-ops"> {
+  let summary = "Fold subview operations into their parent operations";
+  let description = [{
+    This pass identifies `memref.subview` sources of `GatherToLDSOp` and
+    attempts to fold the source ops, potentially simplifying the overall
+    operation and improving performance.
+  }];
+  let dependentDialects = [
+    "memref::MemRefDialect"
+  ];
+}
 #endif // MLIR_DIALECT_AMDGPU_TRANSFORMS_PASSES_TD_
diff --git a/mlir/lib/Dialect/AMDGPU/Transforms/CMakeLists.txt b/mlir/lib/Dialect/AMDGPU/Transforms/CMakeLists.txt
index 17bbe54ea6c0c..20621ec0d55a4 100644
--- a/mlir/lib/Dialect/AMDGPU/Transforms/CMakeLists.txt
+++ b/mlir/lib/Dialect/AMDGPU/Transforms/CMakeLists.txt
@@ -1,7 +1,8 @@
 add_mlir_dialect_library(MLIRAMDGPUTransforms
   EmulateAtomics.cpp
-  ResolveStridedMetadata.cpp
+  FoldSubviewOps.cpp
   MaskedloadToLoad.cpp
+  ResolveStridedMetadata.cpp
 
   ADDITIONAL_HEADER_DIRS
   {$MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/AMDGPU/Transforms
diff --git a/mlir/lib/Dialect/AMDGPU/Transforms/FoldSubviewOps.cpp b/mlir/lib/Dialect/AMDGPU/Transforms/FoldSubviewOps.cpp
new file mode 100644
index 0000000000000..adbdf4b856bd5
--- /dev/null
+++ b/mlir/lib/Dialect/AMDGPU/Transforms/FoldSubviewOps.cpp
@@ -0,0 +1,67 @@
+//===- FoldSubviewOps.cpp - AMDGPU fold subview ops ---------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Dialect/AMDGPU/Transforms/Passes.h"
+
+#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h"
+#include "mlir/Dialect/Affine/ViewLikeInterfaceUtils.h"
+#include "mlir/Dialect/MemRef/IR/MemRef.h"
+#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
+
+namespace mlir::amdgpu {
+#define GEN_PASS_DEF_AMDGPUFOLDSUBVIEWOPSPASS
+#include "mlir/Dialect/AMDGPU/Transforms/Passes.h.inc"
+} // namespace mlir::amdgpu
+
+using namespace mlir;
+using namespace mlir::amdgpu;
+
+namespace {
+struct AmdgpuFoldSubviewOpsPass
+    : public amdgpu::impl::AmdgpuFoldSubviewOpsPassBase<
+          AmdgpuFoldSubviewOpsPass> {
+  void runOnOperation() override {
+    RewritePatternSet patterns(&getContext());
+    populateAmdgpuFoldSubviewOpsPatterns(patterns);
+    if (failed(applyPatternsGreedily(getOperation(), std::move(patterns))))
+      signalPassFailure();
+  }
+};
+
+struct FoldSubviewIntoGatherToLDSOp : public OpRewritePattern<GatherToLDSOp> {
+  using OpRewritePattern<GatherToLDSOp>::OpRewritePattern;
+  LogicalResult matchAndRewrite(GatherToLDSOp op,
+                                PatternRewriter &rewriter) const override {
+    Location loc = op.getLoc();
+
+    // Check if the source is a subview operation:
+    auto subviewOp = dyn_cast<memref::SubViewOp>(op.getSrc().getDefiningOp());
+    if (!subviewOp)
+      return rewriter.notifyMatchFailure(
+          loc, "GatherToLDSOp folding is currently supported only when the "
+               "source is a SubviewOp. This is one specific pattern, and other "
+               "scenarios may be added in the future.");
+
+    SmallVector<Value> sourceIndices;
+    mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides(
+        rewriter, loc, subviewOp.getMixedOffsets(), subviewOp.getMixedStrides(),
+        subviewOp.getDroppedDims(), op.getSrcIndices(), sourceIndices);
+
+    rewriter.replaceOpWithNewOp<GatherToLDSOp>(
+        op, subviewOp.getSource(), sourceIndices, op.getDst(),
+        op.getDstIndices(), op.getTransferType());
+
+    return success();
+  }
+};
+} // namespace
+
+void mlir::amdgpu::populateAmdgpuFoldSubviewOpsPatterns(
+    RewritePatternSet &patterns, PatternBenefit benefit) {
+  patterns.add<FoldSubviewIntoGatherToLDSOp>(patterns.getContext(), benefit);
+}
diff --git a/mlir/test/Dialect/AMDGPU/amdgpu-fold-subviews.mlir b/mlir/test/Dialect/AMDGPU/amdgpu-fold-subviews.mlir
new file mode 100644
index 0000000000000..d582991c3622f
--- /dev/null
+++ b/mlir/test/Dialect/AMDGPU/amdgpu-fold-subviews.mlir
@@ -0,0 +1,50 @@
+// RUN: mlir-opt -amdgpu-fold-subview-ops -split-input-file %s | FileCheck %s
+
+#gpu_lds_addrspace = 3
+
+// CHECK: func @test_memref
+// CHECK-SAME: %[[ARG0:.*]]: index, %[[ARG1:.*]]: index
+func.func @test_memref(%offset_i: index, %offset_j: index) {
+  // CHECK: %[[C0:.*]] = arith.constant 0 : index
+  // CHECK: %[[LOCAL:.*]] = memref.alloc() : memref<64x64xf16, 3>
+  // CHECK: %[[MEM:.*]] = memref.alloc() : memref<64x128xf16>
+  // CHECK:  %[[MEM]][%arg0, %arg1], %[[LOCAL]][%[[C0]], %[[C0]]]
+  // CHECK-SAME: vector<8xf16>, memref<64x128xf16>, memref<64x64xf16, 3>
+
+  %alloc = memref.alloc() : memref<64x64xf16, #gpu_lds_addrspace>
+  %mem = memref.alloc() : memref<64x128xf16>
+  %subview = memref.subview %mem[0, 0][32, 64][1, 1] : memref<64x128xf16> to memref<32x64xf16, strided<[128, 1]>>
+  %c0 = arith.constant 0 : index
+  amdgpu.gather_to_lds %subview[%offset_i, %offset_j], %alloc[%c0, %c0]
+    : vector<8xf16>, memref<32x64xf16, strided<[128, 1]>>, memref<64x64xf16, #gpu_lds_addrspace>
+  func.return
+}
+
+// -----
+
+#gpu_lds_addrspace = 3
+
+// CHECK: #[[MAP:.*]] = affine_map<()[s0] -> (s0 + 32)>
+// CHECK: #[[MAP1:.*]] = affine_map<()[s0] -> (s0 + 64)>
+
+// CHECK: func @subview_folding_offset
+// CHECK-SAME: %[[ARG0:.*]]: index, %[[ARG1:.*]]: index
+func.func @subview_folding_offset(%offset_i: index, %offset_j: index) {
+  // CHECK: %[[C0:.*]] = arith.constant 0 : index
+  // CHECK: %[[LOCAL:.*]] = memref.alloc() : memref<64x64xf16, 3>
+  // CHECK: %[[MEM:.*]] = memref.alloc() : memref<64x128xf16>
+
+  // CHECK: %[[IDX0:.*]] = affine.apply #[[MAP]]()[%[[ARG0]]]
+  // CHECK: %[[IDX1:.*]] = affine.apply #[[MAP1]]()[%[[ARG1]]]
+
+  // CHECK:  %[[MEM]][%[[IDX0]], %[[IDX1]]], %[[LOCAL]][%[[C0]], %[[C0]]]
+  // CHECK-SAME: vector<8xf16>, memref<64x128xf16>, memref<64x64xf16, 3>
+
+  %alloc = memref.alloc() : memref<64x64xf16, #gpu_lds_addrspace>
+  %mem = memref.alloc() : memref<64x128xf16>
+  %subview = memref.subview %mem[32, 64][32, 64][1, 1] : memref<64x128xf16> to memref<32x64xf16, strided<[128, 1], offset: 4160>>
+  %c0 = arith.constant 0 : index
+  amdgpu.gather_to_lds %subview[%offset_i, %offset_j], %alloc[%c0, %c0]
+    : vector<8xf16>, memref<32x64xf16, strided<[128, 1], offset: 4160>>, memref<64x64xf16, #gpu_lds_addrspace>
+  func.return
+}
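
As a sanity check on the second test case, the arithmetic below confirms that the subview's `offset: 4160` equals 32 * 128 + 64, and that reading element (i, j) through the subview touches the same linear element as reading (i + 32, j + 64) from the source directly. This is a standalone sketch with hypothetical helper names, and it assumes the row-major `strided<[128, 1]>` layout shown in the test.

```cpp
#include <cassert>
#include <cstdint>

// Constants taken from the second test case: source memref<64x128xf16>,
// subview at offsets [32, 64], sizes [32, 64], unit strides.
constexpr int64_t kRowStride = 128; // elements per source row
constexpr int64_t kOffsetI = 32;
constexpr int64_t kOffsetJ = 64;

// Linear element offset of the subview's origin inside the source buffer.
constexpr int64_t subviewBaseOffset() {
  return kOffsetI * kRowStride + kOffsetJ;
}

// Linear element touched when (i, j) is read through the subview.
constexpr int64_t addressViaSubview(int64_t i, int64_t j) {
  return subviewBaseOffset() + i * kRowStride + j;
}

// Linear element touched after folding: index the source with
// (i + 32, j + 64) instead.
constexpr int64_t addressAfterFolding(int64_t i, int64_t j) {
  return (i + kOffsetI) * kRowStride + (j + kOffsetJ);
}

static_assert(subviewBaseOffset() == 4160, "matches the offset in the type");

// Checks every in-bounds index of the 32x64 subview.
void checkFoldingPreservesAddresses() {
  for (int64_t i = 0; i < 32; ++i)
    for (int64_t j = 0; j < 64; ++j)
      assert(addressViaSubview(i, j) == addressAfterFolding(i, j));
}
```

The two address functions expand to the same expression, which is why the fold can drop the subview and only adjust the indices.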

llvmbot (Member) commented Jul 21, 2025

@llvm/pr-subscribers-backend-amdgpu

llvmbot (Member) commented Jul 21, 2025

@llvm/pr-subscribers-mlir

llvmbot (Member) commented Jul 21, 2025

@llvm/pr-subscribers-mlir-amdgpu

+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Dialect/AMDGPU/Transforms/Passes.h"
+
+#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h"
+#include "mlir/Dialect/Affine/ViewLikeInterfaceUtils.h"
+#include "mlir/Dialect/MemRef/IR/MemRef.h"
+#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
+
+namespace mlir::amdgpu {
+#define GEN_PASS_DEF_AMDGPUFOLDSUBVIEWOPSPASS
+#include "mlir/Dialect/AMDGPU/Transforms/Passes.h.inc"
+} // namespace mlir::amdgpu
+
+using namespace mlir;
+using namespace mlir::amdgpu;
+
+namespace {
+struct AmdgpuFoldSubviewOpsPass
+    : public amdgpu::impl::AmdgpuFoldSubviewOpsPassBase<
+          AmdgpuFoldSubviewOpsPass> {
+  void runOnOperation() override {
+    RewritePatternSet patterns(&getContext());
+    populateAmdgpuFoldSubviewOpsPatterns(patterns);
+    if (failed(applyPatternsGreedily(getOperation(), std::move(patterns))))
+      signalPassFailure();
+  }
+};
+
+struct FoldSubviewIntoGatherToLDSOp : public OpRewritePattern<GatherToLDSOp> {
+  using OpRewritePattern<GatherToLDSOp>::OpRewritePattern;
+  LogicalResult matchAndRewrite(GatherToLDSOp op,
+                                PatternRewriter &rewriter) const override {
+    Location loc = op.getLoc();
+
+    // Check if the source is a subview operation:
+    auto subviewOp = op.getSrc().getDefiningOp<memref::SubViewOp>();
+    if (!subviewOp)
+      return rewriter.notifyMatchFailure(
+          loc, "GatherToLDSOp folding is currently supported only when the "
+               "source is a SubviewOp. This is one specific pattern, and other "
+               "scenarios may be added in the future.");
+
+    SmallVector<Value> sourceIndices;
+    mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides(
+        rewriter, loc, subviewOp.getMixedOffsets(), subviewOp.getMixedStrides(),
+        subviewOp.getDroppedDims(), op.getSrcIndices(), sourceIndices);
+
+    rewriter.replaceOpWithNewOp<GatherToLDSOp>(
+        op, subviewOp.getSource(), sourceIndices, op.getDst(),
+        op.getDstIndices(), op.getTransferType());
+
+    return success();
+  }
+};
+} // namespace
+
+void mlir::amdgpu::populateAmdgpuFoldSubviewOpsPatterns(
+    RewritePatternSet &patterns, PatternBenefit benefit) {
+  patterns.add<FoldSubviewIntoGatherToLDSOp>(patterns.getContext(), benefit);
+}
diff --git a/mlir/test/Dialect/AMDGPU/amdgpu-fold-subviews.mlir b/mlir/test/Dialect/AMDGPU/amdgpu-fold-subviews.mlir
new file mode 100644
index 0000000000000..d582991c3622f
--- /dev/null
+++ b/mlir/test/Dialect/AMDGPU/amdgpu-fold-subviews.mlir
@@ -0,0 +1,50 @@
+// RUN: mlir-opt -amdgpu-fold-subview-ops -split-input-file %s | FileCheck %s
+
+#gpu_lds_addrspace = 3
+
+// CHECK: func @test_memref
+// CHECK-SAME: %[[ARG0:.*]]: index, %[[ARG1:.*]]: index
+func.func @test_memref(%offset_i: index, %offset_j: index) {
+  // CHECK: %[[C0:.*]] = arith.constant 0 : index
+  // CHECK: %[[LOCAL:.*]] = memref.alloc() : memref<64x64xf16, 3>
+  // CHECK: %[[MEM:.*]] = memref.alloc() : memref<64x128xf16>
+  // CHECK:  %[[MEM]][%arg0, %arg1], %[[LOCAL]][%[[C0]], %[[C0]]]
+  // CHECK-SAME: vector<8xf16>, memref<64x128xf16>, memref<64x64xf16, 3>
+
+  %alloc = memref.alloc() : memref<64x64xf16, #gpu_lds_addrspace>
+  %mem = memref.alloc() : memref<64x128xf16>
+  %subview = memref.subview %mem[0, 0][32, 64][1, 1] : memref<64x128xf16> to memref<32x64xf16, strided<[128, 1]>>
+  %c0 = arith.constant 0 : index
+  amdgpu.gather_to_lds %subview[%offset_i, %offset_j], %alloc[%c0, %c0]
+    : vector<8xf16>, memref<32x64xf16, strided<[128, 1]>>, memref<64x64xf16, #gpu_lds_addrspace>
+  func.return
+}
+
+// -----
+
+#gpu_lds_addrspace = 3
+
+// CHECK: #[[MAP:.*]] = affine_map<()[s0] -> (s0 + 32)>
+// CHECK: #[[MAP1:.*]] = affine_map<()[s0] -> (s0 + 64)>
+
+// CHECK: func @subview_folding_offset
+// CHECK-SAME: %[[ARG0:.*]]: index, %[[ARG1:.*]]: index
+func.func @subview_folding_offset(%offset_i: index, %offset_j: index) {
+  // CHECK: %[[C0:.*]] = arith.constant 0 : index
+  // CHECK: %[[LOCAL:.*]] = memref.alloc() : memref<64x64xf16, 3>
+  // CHECK: %[[MEM:.*]] = memref.alloc() : memref<64x128xf16>
+
+  // CHECK: %[[IDX0:.*]] = affine.apply #[[MAP]]()[%[[ARG0]]]
+  // CHECK: %[[IDX1:.*]] = affine.apply #[[MAP1]]()[%[[ARG1]]]
+
+  // CHECK:  %[[MEM]][%[[IDX0]], %[[IDX1]]], %[[LOCAL]][%[[C0]], %[[C0]]]
+  // CHECK-SAME: vector<8xf16>, memref<64x128xf16>, memref<64x64xf16, 3>
+
+  %alloc = memref.alloc() : memref<64x64xf16, #gpu_lds_addrspace>
+  %mem = memref.alloc() : memref<64x128xf16>
+  %subview = memref.subview %mem[32, 64][32, 64][1, 1] : memref<64x128xf16> to memref<32x64xf16, strided<[128, 1], offset: 4160>>
+  %c0 = arith.constant 0 : index
+  amdgpu.gather_to_lds %subview[%offset_i, %offset_j], %alloc[%c0, %c0]
+    : vector<8xf16>, memref<32x64xf16, strided<[128, 1], offset: 4160>>, memref<64x64xf16, #gpu_lds_addrspace>
+  func.return
+}

@krzysz00
Contributor

High-level comment: this doesn't need to be a new pass. Just go add logic to FoldMemRefAliasOps to handle this one

@krzysz00
Contributor

(That is, tentative reject for being overcomplicated)

(This'll all work nicely if I ever have the time to get my interfaces for DMA-like ops and load/store-like ops up, but for now, this goes in with the rest of the subview folding logic)

Contributor

@qedawkins qedawkins left a comment

Thanks for adding the populate function! We'll want to add a FoldMemRefAlias pass variant that runs these patterns so we aren't applying them on their own, and that'll help.

@lialan
Member Author

lialan commented Jul 21, 2025

High-level comment: this doesn't need to be a new pass. Just go add logic to FoldMemRefAliasOps to handle this one

Ahh they have NVGPU patterns in the same file so I think it is okay to just do it there. I will move things there.

@qedawkins
Contributor

High-level comment: this doesn't need to be a new pass. Just go add logic to FoldMemRefAliasOps to handle this one

Ahh they have NVGPU patterns in the same file so I think it is okay to just do it there. I will move things there.

No, we should keep it out. The NVGPU patterns should not be there either.

@lialan lialan changed the title [AMDGPU] fold memref.subview into amdgpu.gather_to_lds [AMDGPU] fold memref.subview/expand_shape/collapse_shape into amdgpu.gather_to_lds Jul 23, 2025
Comment on lines 22 to 23
struct AmdgpuFoldMemRefOpsPass
: public amdgpu::impl::AmdgpuFoldSubviewOpsPassBase<
Member

Suggested change
-struct AmdgpuFoldMemRefOpsPass
-    : public amdgpu::impl::AmdgpuFoldSubviewOpsPassBase<
+struct AmdgpuFoldMemRefOpsPass final
+    : amdgpu::impl::AmdgpuFoldSubviewOpsPassBase<

Member Author

nice

void runOnOperation() override {
RewritePatternSet patterns(&getContext());
populateAmdgpuFoldSubviewOpsPatterns(patterns);
if (failed(applyPatternsGreedily(getOperation(), std::move(patterns))))
Member

Do we need the greedy rewriter or is walkAndApplyPatterns enough?

Member Author

TIL. It is a crime that walkAndApplyPatterns is not being used widely.

@lialan lialan requested a review from kuhar July 23, 2025 14:22
@lialan lialan merged commit dbc63f1 into llvm:main Jul 23, 2025
9 checks passed
@lialan lialan deleted the fold_subviews branch July 23, 2025 15:22
@lialan
Member Author

lialan commented Jul 23, 2025

... But also, if we're creating a new file etc., can we go ahead and handle memref.expand_shape and memref.collapse_shape folding?

updated as well.
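
For readers unfamiliar with what folding these two ops involves, here is a rough sketch of the index arithmetic, written as plain C++ rather than MLIR. It is an illustration under stated assumptions, not the code in this PR: `linearize` and `delinearize` are hypothetical helper names, the shapes in the comments are made up, and only static sizes within a single reassociation group are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Folding memref.expand_shape: the consumer indexes the expanded memref, so
// the indices of one reassociation group are linearized (row-major over the
// group's expanded sizes) into a single index on the source.
int64_t linearize(const std::vector<int64_t> &indices,
                  const std::vector<int64_t> &sizes) {
  assert(indices.size() == sizes.size());
  int64_t linear = 0;
  for (std::size_t d = 0; d < indices.size(); ++d)
    linear = linear * sizes[d] + indices[d];
  return linear;
}

// Folding memref.collapse_shape: the consumer indexes the collapsed memref,
// so one collapsed index is split back into indices over the source sizes
// of the group, innermost dimension first.
std::vector<int64_t> delinearize(int64_t linear,
                                 const std::vector<int64_t> &sizes) {
  std::vector<int64_t> indices(sizes.size());
  for (std::size_t d = sizes.size(); d-- > 0;) {
    indices[d] = linear % sizes[d];
    linear /= sizes[d];
  }
  return indices;
}
```

As a made-up example, if a dimension of size 128 is expanded into 2x64, the expanded indices (1, 5) fold to source index `linearize({1, 5}, {2, 64})`, which is 1 * 64 + 5 = 69. In the other direction, `delinearize(69, {2, 64})` recovers (1, 5).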

@llvm-ci
Collaborator

llvm-ci commented Jul 23, 2025

LLVM Buildbot has detected a new failure on builder mlir-nvidia running on mlir-nvidia while building mlir at step 6 "build-check-mlir-build-only".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/16495

Here is the relevant piece of the build log for the reference
Step 6 (build-check-mlir-build-only) failure: build (failure)
...
62.449 [93/7/5243] Linking CXX shared library lib/libMLIRAMDGPUUtils.so.22.0git
62.454 [92/7/5244] Creating library symlink lib/libMLIRAMDGPUUtils.so
62.509 [88/10/5245] Linking CXX shared library lib/libMLIRGPUTransforms.so.22.0git
62.518 [87/10/5246] Creating library symlink lib/libMLIRGPUTransforms.so
62.587 [84/12/5247] Linking CXX shared library lib/libMLIRArithToAMDGPU.so.22.0git
62.588 [83/12/5248] Linking CXX executable tools/mlir/unittests/Dialect/AMDGPU/MLIRAMDGPUTests
62.590 [83/11/5249] Linking CXX shared library lib/libMLIRAMDGPUToROCDL.so.22.0git
62.593 [82/11/5250] Creating library symlink lib/libMLIRArithToAMDGPU.so
62.596 [82/10/5251] Creating library symlink lib/libMLIRAMDGPUToROCDL.so
62.599 [82/9/5252] Linking CXX shared library lib/libMLIRAMDGPUTransforms.so.22.0git
FAILED: lib/libMLIRAMDGPUTransforms.so.22.0git 
: && /usr/bin/clang++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Werror=mismatched-tags -Werror=global-constructors -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete -fuse-ld=lld -Wl,--color-diagnostics   -Wl,--gc-sections -shared -Wl,-soname,libMLIRAMDGPUTransforms.so.22.0git -o lib/libMLIRAMDGPUTransforms.so.22.0git tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/EmulateAtomics.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/MaskedloadToLoad.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/ResolveStridedMetadata.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib:"  lib/libMLIRAMDGPUUtils.so.22.0git  lib/libMLIRSCFDialect.so.22.0git  lib/libMLIRVectorDialect.so.22.0git  lib/libMLIRControlFlowDialect.so.22.0git  lib/libMLIRFuncDialect.so.22.0git  lib/libMLIRTransforms.so.22.0git  lib/libMLIRTransformUtils.so.22.0git  lib/libMLIRAMDGPUDialect.so.22.0git  lib/libMLIRROCDLDialect.so.22.0git  lib/libMLIRLLVMDialect.so.22.0git  lib/libLLVMBitWriter.so.22.0git  lib/libLLVMBitReader.so.22.0git  lib/libLLVMAsmParser.so.22.0git  lib/libLLVMCore.so.22.0git  lib/libLLVMBinaryFormat.so.22.0git  lib/libMLIRGPUDialect.so.22.0git  lib/libMLIRDLTIDialect.so.22.0git  lib/libMLIRMathDialect.so.22.0git  lib/libMLIRMemRefUtils.so.22.0git  lib/libMLIRTensorDialect.so.22.0git 
 lib/libMLIRParallelCombiningOpInterface.so.22.0git  lib/libMLIRAffineDialect.so.22.0git  lib/libMLIRMemRefDialect.so.22.0git  lib/libMLIRArithUtils.so.22.0git  lib/libMLIRComplexDialect.so.22.0git  lib/libMLIRArithDialect.so.22.0git  lib/libMLIRCastInterfaces.so.22.0git  lib/libMLIRInferIntRangeCommon.so.22.0git  lib/libMLIRDialect.so.22.0git  lib/libMLIRDialectUtils.so.22.0git  lib/libMLIRShapedOpInterfaces.so.22.0git  lib/libMLIRIndexingMapOpInterface.so.22.0git  lib/libMLIRMaskableOpInterface.so.22.0git  lib/libMLIRMaskingOpInterface.so.22.0git  lib/libMLIRVectorInterfaces.so.22.0git  lib/libMLIRSubsetOpInterface.so.22.0git  lib/libMLIRValueBoundsOpInterface.so.22.0git  lib/libMLIRDestinationStyleOpInterface.so.22.0git  lib/libMLIRRewrite.so.22.0git  lib/libMLIRRewritePDL.so.22.0git  lib/libMLIRPDLToPDLInterp.so.22.0git  lib/libMLIRPass.so.22.0git  lib/libMLIRPDLInterpDialect.so.22.0git  lib/libMLIRPDLDialect.so.22.0git  lib/libMLIRUBDialect.so.22.0git  lib/libMLIRMemorySlotInterfaces.so.22.0git  lib/libMLIRAnalysis.so.22.0git  lib/libMLIRSideEffectInterfaces.so.22.0git  lib/libMLIRInferIntRangeInterface.so.22.0git  lib/libMLIRInferTypeOpInterface.so.22.0git  lib/libMLIRControlFlowInterfaces.so.22.0git  lib/libMLIRViewLikeInterface.so.22.0git  lib/libMLIRLoopLikeInterface.so.22.0git  lib/libMLIRFunctionInterfaces.so.22.0git  lib/libMLIRDataLayoutInterfaces.so.22.0git  lib/libMLIRCallInterfaces.so.22.0git  lib/libMLIRPresburger.so.22.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.22.0git  lib/libMLIRIR.so.22.0git  lib/libMLIRSupport.so.22.0git  lib/libLLVMSupport.so.22.0git  -Wl,-rpath-link,/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib && :
ld.lld: error: undefined symbol: mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides(mlir::RewriterBase&, mlir::Location, llvm::ArrayRef<mlir::OpFoldResult>, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallBitVector const&, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallVectorImpl<mlir::Value>&)
>>> referenced by FoldMemRefsOps.cpp
>>>               tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o:(mlir::amdgpu::FoldMemRefOpsIntoGatherToLDSOp::matchAndRewrite(mlir::amdgpu::GatherToLDSOp, mlir::PatternRewriter&) const::'lambda'(mlir::memref::SubViewOp)::operator()(mlir::memref::SubViewOp) const)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
62.695 [82/8/5253] Linking CXX shared library lib/libMLIRSCFToGPU.so.22.0git
62.706 [82/7/5254] Linking CXX shared library lib/libMLIRLinalgTestPasses.so.22.0git
62.711 [82/6/5255] Linking CXX shared library lib/libMLIRGPUToGPURuntimeTransforms.so.22.0git
80.538 [82/5/5256] Building CXX object tools/mlir/lib/CAPI/RegisterEverything/CMakeFiles/obj.MLIRCAPIRegisterEverything.dir/RegisterEverything.cpp.o
80.939 [82/4/5257] Building CXX object tools/mlir/tools/mlir-reduce/CMakeFiles/mlir-reduce.dir/mlir-reduce.cpp.o
81.978 [82/3/5258] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/MLIRMlirOptMain.dir/mlir-opt.cpp.o
83.872 [82/2/5259] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o
85.428 [82/1/5260] Building CXX object tools/mlir/examples/transform-opt/CMakeFiles/mlir-transform-opt.dir/mlir-transform-opt.cpp.o
ninja: build stopped: subcommand failed.

@llvm-ci
Collaborator

llvm-ci commented Jul 23, 2025

LLVM Buildbot has detected a new failure on builder flang-aarch64-sharedlibs running on linaro-flang-aarch64-sharedlibs while building mlir at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/80/builds/14755

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
...
115.113 [1120/32/6612] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/Character.cpp.o
115.117 [1120/31/6613] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/EnvironmentDefaults.cpp.o
115.120 [1120/30/6614] Linking CXX shared library lib/libMLIRArithToAMDGPU.so.22.0git
115.122 [1120/29/6615] Linking CXX executable bin/clang-linker-wrapper
115.125 [1120/28/6616] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/PPCIntrinsicCall.cpp.o
115.128 [1120/27/6617] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/Command.cpp.o
115.131 [1120/26/6618] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/Exceptions.cpp.o
115.132 [1120/25/6619] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/MutableBox.cpp.o
115.133 [1120/24/6620] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/CUDA/Descriptor.cpp.o
115.145 [1120/23/6621] Linking CXX shared library lib/libMLIRAMDGPUTransforms.so.22.0git
FAILED: lib/libMLIRAMDGPUTransforms.so.22.0git 
: && /usr/local/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Werror=mismatched-tags -Werror=global-constructors -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/tcwg-buildbot/worker/flang-aarch64-sharedlibs/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRAMDGPUTransforms.so.22.0git -o lib/libMLIRAMDGPUTransforms.so.22.0git tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/EmulateAtomics.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/MaskedloadToLoad.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/ResolveStridedMetadata.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/tcwg-buildbot/worker/flang-aarch64-sharedlibs/build/lib:"  lib/libMLIRAMDGPUUtils.so.22.0git  lib/libMLIRSCFDialect.so.22.0git  lib/libMLIRVectorDialect.so.22.0git  lib/libMLIRControlFlowDialect.so.22.0git  lib/libMLIRFuncDialect.so.22.0git  lib/libMLIRTransforms.so.22.0git  lib/libMLIRTransformUtils.so.22.0git  lib/libMLIRAMDGPUDialect.so.22.0git  lib/libMLIRROCDLDialect.so.22.0git  lib/libMLIRLLVMDialect.so.22.0git  lib/libLLVMBitWriter.so.22.0git  lib/libLLVMBitReader.so.22.0git  lib/libLLVMAsmParser.so.22.0git  lib/libLLVMCore.so.22.0git  lib/libLLVMBinaryFormat.so.22.0git  lib/libMLIRGPUDialect.so.22.0git  lib/libMLIRDLTIDialect.so.22.0git  lib/libMLIRMathDialect.so.22.0git  
lib/libMLIRMemRefUtils.so.22.0git  lib/libMLIRTensorDialect.so.22.0git  lib/libMLIRParallelCombiningOpInterface.so.22.0git  lib/libMLIRAffineDialect.so.22.0git  lib/libMLIRMemRefDialect.so.22.0git  lib/libMLIRArithUtils.so.22.0git  lib/libMLIRComplexDialect.so.22.0git  lib/libMLIRArithDialect.so.22.0git  lib/libMLIRCastInterfaces.so.22.0git  lib/libMLIRInferIntRangeCommon.so.22.0git  lib/libMLIRDialect.so.22.0git  lib/libMLIRDialectUtils.so.22.0git  lib/libMLIRShapedOpInterfaces.so.22.0git  lib/libMLIRIndexingMapOpInterface.so.22.0git  lib/libMLIRMaskableOpInterface.so.22.0git  lib/libMLIRMaskingOpInterface.so.22.0git  lib/libMLIRVectorInterfaces.so.22.0git  lib/libMLIRSubsetOpInterface.so.22.0git  lib/libMLIRValueBoundsOpInterface.so.22.0git  lib/libMLIRDestinationStyleOpInterface.so.22.0git  lib/libMLIRRewrite.so.22.0git  lib/libMLIRRewritePDL.so.22.0git  lib/libMLIRPDLToPDLInterp.so.22.0git  lib/libMLIRPass.so.22.0git  lib/libMLIRPDLInterpDialect.so.22.0git  lib/libMLIRPDLDialect.so.22.0git  lib/libMLIRUBDialect.so.22.0git  lib/libMLIRMemorySlotInterfaces.so.22.0git  lib/libMLIRAnalysis.so.22.0git  lib/libMLIRSideEffectInterfaces.so.22.0git  lib/libMLIRInferIntRangeInterface.so.22.0git  lib/libMLIRInferTypeOpInterface.so.22.0git  lib/libMLIRControlFlowInterfaces.so.22.0git  lib/libMLIRViewLikeInterface.so.22.0git  lib/libMLIRLoopLikeInterface.so.22.0git  lib/libMLIRFunctionInterfaces.so.22.0git  lib/libMLIRDataLayoutInterfaces.so.22.0git  lib/libMLIRCallInterfaces.so.22.0git  lib/libMLIRPresburger.so.22.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.22.0git  lib/libMLIRIR.so.22.0git  lib/libMLIRSupport.so.22.0git  lib/libLLVMSupport.so.22.0git  -Wl,-rpath-link,/home/tcwg-buildbot/worker/flang-aarch64-sharedlibs/build/lib && :
/usr/bin/ld: tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o: in function `mlir::amdgpu::FoldMemRefOpsIntoGatherToLDSOp::matchAndRewrite(mlir::amdgpu::GatherToLDSOp, mlir::PatternRewriter&) const::{lambda(mlir::memref::SubViewOp)#1}::operator()(mlir::memref::SubViewOp) const':
FoldMemRefsOps.cpp:(.text._ZZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterEENKUlNS_6memref9SubViewOpEE_clES6_[_ZZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterEENKUlNS_6memref9SubViewOpEE_clES6_]+0xe0): undefined reference to `mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides(mlir::RewriterBase&, mlir::Location, llvm::ArrayRef<mlir::OpFoldResult>, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallBitVector const&, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallVectorImpl<mlir::Value>&)'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
115.162 [1120/22/6622] Linking CXX shared library lib/libMLIRTargetLLVM.so.22.0git
115.180 [1120/21/6623] Linking CXX shared library lib/libMLIRArmSMEToLLVM.so.22.0git
115.216 [1120/20/6624] Linking CXX shared library lib/libMLIRTestTransformDialect.so.22.0git
115.351 [1120/19/6625] Linking CXX shared library lib/libMLIRAsyncTransforms.so.22.0git
115.359 [1120/18/6626] Linking CXX shared library lib/libMLIRSCFToSPIRV.so.22.0git
115.379 [1120/17/6627] Linking CXX shared library lib/libMLIRTensorToSPIRV.so.22.0git
115.386 [1120/16/6628] Linking CXX executable bin/llvm-lto
115.423 [1120/15/6629] Linking CXX shared library lib/libMLIRSCFTransformOps.so.22.0git
115.457 [1120/14/6630] Linking CXX shared library lib/libMLIRVectorToLLVMPass.so.22.0git
115.463 [1120/13/6631] Linking CXX shared library lib/libMLIRExecutionEngine.so.22.0git
115.508 [1120/12/6632] Linking CXX shared library lib/libMLIRVectorTransformOps.so.22.0git
115.529 [1120/11/6633] Linking CXX shared library lib/libMLIRToLLVMIRTranslationRegistration.so.22.0git
115.559 [1120/10/6634] Linking CXX shared library lib/libMLIRTestVectorToSPIRV.so.22.0git
115.574 [1120/9/6635] Building CXX object tools/flang/lib/FrontendTool/CMakeFiles/flangFrontendTool.dir/ExecuteCompilerInvocation.cpp.o
115.862 [1120/8/6636] Linking CXX shared library lib/libclangAST.so.22.0git
116.131 [1120/7/6637] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/IntrinsicCall.cpp.o
116.214 [1120/6/6638] Building CXX object tools/flang/lib/Support/CMakeFiles/FortranSupport.dir/Version.cpp.o
142.689 [1120/5/6639] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/cmake_pch.hxx.pch
159.002 [1120/4/6640] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/cmake_pch.hxx.pch
187.806 [1120/3/6641] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/cmake_pch.hxx.pch
214.744 [1120/2/6642] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/cmake_pch.hxx.pch
310.903 [1120/1/6643] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/cmake_pch.hxx.pch
ninja: build stopped: subcommand failed.

@llvm-ci
Collaborator

llvm-ci commented Jul 23, 2025

LLVM Buildbot has detected a new failure on builder amdgpu-offload-rhel-8-cmake-build-only running on rocm-docker-rhel-8 while building mlir at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/204/builds/16486

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py --jobs=32' (failure)
...
[7327/7955] Linking CXX shared library lib/libMLIRAMDGPUToROCDL.so.22.0git
[7328/7955] Creating library symlink lib/libMLIRAMDGPUToROCDL.so
[7329/7955] Linking CXX shared library lib/libMLIRArithToAMDGPU.so.22.0git
[7330/7955] Creating library symlink lib/libMLIRArithToAMDGPU.so
[7331/7955] Linking CXX shared library lib/libMLIRGPUTransforms.so.22.0git
[7332/7955] Creating library symlink lib/libMLIRGPUTransforms.so
[7333/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/Assign.cpp.o
[7334/7955] Linking CXX shared library lib/libMLIRCAPIAMDGPU.so.22.0git
[7335/7955] Creating library symlink lib/libMLIRCAPIAMDGPU.so
[7336/7955] Linking CXX shared library lib/libMLIRAMDGPUTransforms.so.22.0git
FAILED: lib/libMLIRAMDGPUTransforms.so.22.0git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -Wno-deprecated-copy -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-rhel-8-cmake-build-only/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRAMDGPUTransforms.so.22.0git -o lib/libMLIRAMDGPUTransforms.so.22.0git tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/EmulateAtomics.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/MaskedloadToLoad.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/ResolveStridedMetadata.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/amdgpu-offload-rhel-8-cmake-build-only/build/lib:"  lib/libMLIRAMDGPUUtils.so.22.0git  lib/libMLIRSCFDialect.so.22.0git  lib/libMLIRVectorDialect.so.22.0git  lib/libMLIRControlFlowDialect.so.22.0git  lib/libMLIRFuncDialect.so.22.0git  lib/libMLIRTransforms.so.22.0git  lib/libMLIRTransformUtils.so.22.0git  lib/libMLIRAMDGPUDialect.so.22.0git  lib/libMLIRROCDLDialect.so.22.0git  lib/libMLIRLLVMDialect.so.22.0git  lib/libLLVMBitWriter.so.22.0git  lib/libLLVMBitReader.so.22.0git  lib/libLLVMAsmParser.so.22.0git  lib/libLLVMCore.so.22.0git  lib/libLLVMBinaryFormat.so.22.0git  lib/libMLIRGPUDialect.so.22.0git  lib/libMLIRDLTIDialect.so.22.0git  lib/libMLIRMathDialect.so.22.0git  lib/libMLIRMemRefUtils.so.22.0git  lib/libMLIRTensorDialect.so.22.0git  
lib/libMLIRParallelCombiningOpInterface.so.22.0git  lib/libMLIRAffineDialect.so.22.0git  lib/libMLIRMemRefDialect.so.22.0git  lib/libMLIRArithUtils.so.22.0git  lib/libMLIRComplexDialect.so.22.0git  lib/libMLIRArithDialect.so.22.0git  lib/libMLIRCastInterfaces.so.22.0git  lib/libMLIRInferIntRangeCommon.so.22.0git  lib/libMLIRDialect.so.22.0git  lib/libMLIRDialectUtils.so.22.0git  lib/libMLIRShapedOpInterfaces.so.22.0git  lib/libMLIRIndexingMapOpInterface.so.22.0git  lib/libMLIRMaskableOpInterface.so.22.0git  lib/libMLIRMaskingOpInterface.so.22.0git  lib/libMLIRVectorInterfaces.so.22.0git  lib/libMLIRSubsetOpInterface.so.22.0git  lib/libMLIRValueBoundsOpInterface.so.22.0git  lib/libMLIRDestinationStyleOpInterface.so.22.0git  lib/libMLIRRewrite.so.22.0git  lib/libMLIRRewritePDL.so.22.0git  lib/libMLIRPDLToPDLInterp.so.22.0git  lib/libMLIRPass.so.22.0git  lib/libMLIRPDLInterpDialect.so.22.0git  lib/libMLIRPDLDialect.so.22.0git  lib/libMLIRUBDialect.so.22.0git  lib/libMLIRMemorySlotInterfaces.so.22.0git  lib/libMLIRAnalysis.so.22.0git  lib/libMLIRSideEffectInterfaces.so.22.0git  lib/libMLIRInferIntRangeInterface.so.22.0git  lib/libMLIRInferTypeOpInterface.so.22.0git  lib/libMLIRControlFlowInterfaces.so.22.0git  lib/libMLIRViewLikeInterface.so.22.0git  lib/libMLIRLoopLikeInterface.so.22.0git  lib/libMLIRFunctionInterfaces.so.22.0git  lib/libMLIRDataLayoutInterfaces.so.22.0git  lib/libMLIRCallInterfaces.so.22.0git  lib/libMLIRPresburger.so.22.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.22.0git  lib/libMLIRIR.so.22.0git  lib/libMLIRSupport.so.22.0git  -lpthread  lib/libLLVMSupport.so.22.0git  -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-rhel-8-cmake-build-only/build/lib && :
tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o: In function `mlir::amdgpu::FoldMemRefOpsIntoGatherToLDSOp::matchAndRewrite(mlir::amdgpu::GatherToLDSOp, mlir::PatternRewriter&) const [clone .constprop.183]':
FoldMemRefsOps.cpp:(.text._ZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterE.constprop.183[_ZNK4mlir6detail31OpOrInterfaceRewritePatternBaseINS_6amdgpu13GatherToLDSOpEE15matchAndRewriteEPNS_9OperationERNS_15PatternRewriterE]+0x2de): undefined reference to `mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides(mlir::RewriterBase&, mlir::Location, llvm::ArrayRef<mlir::OpFoldResult>, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallBitVector const&, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallVectorImpl<mlir::Value>&)'
tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o: In function `mlir::amdgpu::FoldMemRefOpsIntoGatherToLDSOp::matchAndRewrite(mlir::amdgpu::GatherToLDSOp, mlir::PatternRewriter&) const':
FoldMemRefsOps.cpp:(.text._ZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterE[_ZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterE]+0x2de): undefined reference to `mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides(mlir::RewriterBase&, mlir::Location, llvm::ArrayRef<mlir::OpFoldResult>, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallBitVector const&, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallVectorImpl<mlir::Value>&)'
collect2: error: ld returned 1 exit status
[7337/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/MutableBox.cpp.o
[7338/7955] Linking CXX shared library lib/libMLIRGPUToGPURuntimeTransforms.so.22.0git
[7339/7955] Linking CXX shared library lib/libMLIRSCFToGPU.so.22.0git
[7340/7955] Linking CXX shared library lib/libMLIRCAPIGPU.so.22.0git
[7341/7955] Building CXX object tools/flang/lib/Optimizer/Analysis/CMakeFiles/FIRAnalysis.dir/TBAAForest.cpp.o
[7342/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Character.cpp.o
[7343/7955] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/intrinsics.test.dir/intrinsics.cpp.o
[7344/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/Character.cpp.o
[7345/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/Command.cpp.o
[7346/7955] Linking CXX shared library lib/libclang-cpp.so.22.0git
[7347/7955] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/cmake_pch.hxx.gch
[7348/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/CUDA/Descriptor.cpp.o
[7349/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/Derived.cpp.o
[7350/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Runtime/EnvironmentDefaults.cpp.o
[7351/7955] Building CXX object tools/flang/lib/Optimizer/Analysis/CMakeFiles/FIRAnalysis.dir/AliasAnalysis.cpp.o
[7352/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/HLFIRTools.cpp.o
[7353/7955] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/cmake_pch.hxx.gch
[7354/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/FIRBuilder.cpp.o
[7355/7955] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/expression.test.dir/expression.cpp.o
[7356/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/CUFCommon.cpp.o
[7357/7955] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/folding.test.dir/folding.cpp.o
[7358/7955] Building CXX object tools/flang/lib/FrontendTool/CMakeFiles/flangFrontendTool.dir/ExecuteCompilerInvocation.cpp.o
[7359/7955] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/cmake_pch.hxx.gch
[7360/7955] Building CXX object tools/mlir/tools/mlir-reduce/CMakeFiles/mlir-reduce.dir/mlir-reduce.cpp.o
[7361/7955] Building CXX object tools/mlir/lib/CAPI/RegisterEverything/CMakeFiles/obj.MLIRCAPIRegisterEverything.dir/RegisterEverything.cpp.o
[7362/7955] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/MLIRMlirOptMain.dir/mlir-opt.cpp.o
[7363/7955] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o
[7364/7955] Building CXX object tools/mlir/examples/transform-opt/CMakeFiles/mlir-transform-opt.dir/mlir-transform-opt.cpp.o
[7365/7955] Building CXX object tools/flang/tools/f18-parse-demo/CMakeFiles/f18-parse-demo.dir/f18-parse-demo.cpp.o
[7366/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/PPCIntrinsicCall.cpp.o
[7367/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/IntrinsicCall.cpp.o
[7368/7955] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/cmake_pch.hxx.gch
Step 7 (build cmake config) failure: build cmake config (failure)
...

@kuhar
Member

kuhar commented Jul 23, 2025

@lialan can you look at the link errors?

@lialan
Member Author

lialan commented Jul 23, 2025

> @lialan can you look at the link errors?

Definitely a missing MemRefUtils dependency in some backend; I will take a look.
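
For reference, the undefined symbol in the logs is `mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides`, and the failing link line already lists `libMLIRMemRefUtils` and `libMLIRAffineDialect` but no Affine utilities library. Below is a minimal sketch of the kind of CMake change that might resolve this. It assumes the symbol is provided by a library target named `MLIRAffineUtils` and that the library is declared with `add_mlir_dialect_library`; neither is confirmed in this thread. The source file names are taken from the link command above.

```cmake
# Sketch only: mlir/lib/Dialect/AMDGPU/Transforms/CMakeLists.txt (abbreviated).
add_mlir_dialect_library(MLIRAMDGPUTransforms
  EmulateAtomics.cpp
  FoldMemRefsOps.cpp
  MaskedloadToLoad.cpp
  ResolveStridedMetadata.cpp

  LINK_LIBS PUBLIC
  MLIRAMDGPUDialect
  MLIRAMDGPUUtils
  MLIRAffineDialect
  # Assumption: this target defines
  # mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides.
  MLIRAffineUtils
  MLIRMemRefDialect
  MLIRMemRefUtils
  )
```

The failing link command passes `-Wl,-z,defs`, which rejects unresolved symbols when a shared library is linked, so a missing `LINK_LIBS` entry fails at link time in these shared-library builds.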

@llvm-ci
Collaborator

llvm-ci commented Jul 23, 2025

LLVM Buildbot has detected a new failure on builder amdgpu-offload-ubuntu-22-cmake-build-only running on rocm-docker-ubu-22 while building mlir at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/203/builds/17673

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py --jobs=32' (failure)
...
[7256/7955] Creating library symlink lib/libMLIRAMDGPUDialect.so
[7257/7955] Linking CXX shared library lib/libMLIRROCDLTarget.so.22.0git
[7258/7955] Creating library symlink lib/libMLIRROCDLTarget.so
[7259/7955] Linking CXX shared library lib/libMLIRAMDGPUUtils.so.22.0git
[7260/7955] Creating library symlink lib/libMLIRAMDGPUUtils.so
[7261/7955] Linking CXX shared library lib/libMLIRAMDGPUToROCDL.so.22.0git
[7262/7955] Creating library symlink lib/libMLIRAMDGPUToROCDL.so
[7263/7955] Linking CXX shared library lib/libMLIRGPUTransforms.so.22.0git
[7264/7955] Linking CXX shared library lib/libMLIRArithToAMDGPU.so.22.0git
[7265/7955] Linking CXX shared library lib/libMLIRAMDGPUTransforms.so.22.0git
FAILED: lib/libMLIRAMDGPUTransforms.so.22.0git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -Wno-deprecated-copy -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRAMDGPUTransforms.so.22.0git -o lib/libMLIRAMDGPUTransforms.so.22.0git tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/EmulateAtomics.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/MaskedloadToLoad.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/ResolveStridedMetadata.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib:"  lib/libMLIRAMDGPUUtils.so.22.0git  lib/libMLIRSCFDialect.so.22.0git  lib/libMLIRVectorDialect.so.22.0git  lib/libMLIRControlFlowDialect.so.22.0git  lib/libMLIRFuncDialect.so.22.0git  lib/libMLIRTransforms.so.22.0git  lib/libMLIRTransformUtils.so.22.0git  lib/libMLIRAMDGPUDialect.so.22.0git  lib/libMLIRROCDLDialect.so.22.0git  lib/libMLIRLLVMDialect.so.22.0git  lib/libLLVMBitWriter.so.22.0git  lib/libLLVMBitReader.so.22.0git  lib/libLLVMAsmParser.so.22.0git  lib/libLLVMCore.so.22.0git  lib/libLLVMBinaryFormat.so.22.0git  lib/libMLIRGPUDialect.so.22.0git  lib/libMLIRDLTIDialect.so.22.0git  lib/libMLIRMathDialect.so.22.0git  
lib/libMLIRMemRefUtils.so.22.0git  lib/libMLIRTensorDialect.so.22.0git  lib/libMLIRParallelCombiningOpInterface.so.22.0git  lib/libMLIRAffineDialect.so.22.0git  lib/libMLIRMemRefDialect.so.22.0git  lib/libMLIRArithUtils.so.22.0git  lib/libMLIRComplexDialect.so.22.0git  lib/libMLIRArithDialect.so.22.0git  lib/libMLIRCastInterfaces.so.22.0git  lib/libMLIRInferIntRangeCommon.so.22.0git  lib/libMLIRDialect.so.22.0git  lib/libMLIRDialectUtils.so.22.0git  lib/libMLIRShapedOpInterfaces.so.22.0git  lib/libMLIRIndexingMapOpInterface.so.22.0git  lib/libMLIRMaskableOpInterface.so.22.0git  lib/libMLIRMaskingOpInterface.so.22.0git  lib/libMLIRVectorInterfaces.so.22.0git  lib/libMLIRSubsetOpInterface.so.22.0git  lib/libMLIRValueBoundsOpInterface.so.22.0git  lib/libMLIRDestinationStyleOpInterface.so.22.0git  lib/libMLIRRewrite.so.22.0git  lib/libMLIRRewritePDL.so.22.0git  lib/libMLIRPDLToPDLInterp.so.22.0git  lib/libMLIRPass.so.22.0git  lib/libMLIRPDLInterpDialect.so.22.0git  lib/libMLIRPDLDialect.so.22.0git  lib/libMLIRUBDialect.so.22.0git  lib/libMLIRMemorySlotInterfaces.so.22.0git  lib/libMLIRAnalysis.so.22.0git  lib/libMLIRSideEffectInterfaces.so.22.0git  lib/libMLIRInferIntRangeInterface.so.22.0git  lib/libMLIRInferTypeOpInterface.so.22.0git  lib/libMLIRControlFlowInterfaces.so.22.0git  lib/libMLIRViewLikeInterface.so.22.0git  lib/libMLIRLoopLikeInterface.so.22.0git  lib/libMLIRFunctionInterfaces.so.22.0git  lib/libMLIRDataLayoutInterfaces.so.22.0git  lib/libMLIRCallInterfaces.so.22.0git  lib/libMLIRPresburger.so.22.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.22.0git  lib/libMLIRIR.so.22.0git  lib/libMLIRSupport.so.22.0git  lib/libLLVMSupport.so.22.0git  -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-ubuntu-22-cmake-build-only/build/lib && :
/usr/bin/ld: tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o: in function `mlir::amdgpu::FoldMemRefOpsIntoGatherToLDSOp::matchAndRewrite(mlir::amdgpu::GatherToLDSOp, mlir::PatternRewriter&) const':
FoldMemRefsOps.cpp:(.text._ZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterE[_ZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterE]+0x2f2): undefined reference to `mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides(mlir::RewriterBase&, mlir::Location, llvm::ArrayRef<mlir::OpFoldResult>, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallBitVector const&, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallVectorImpl<mlir::Value>&)'
collect2: error: ld returned 1 exit status
[7266/7955] Creating library symlink lib/libMLIRArithToAMDGPU.so
[7267/7955] Building CXX object tools/flang/lib/Optimizer/Support/CMakeFiles/FIRSupport.dir/Utils.cpp.o
[7268/7955] Building CXX object tools/flang/lib/Support/CMakeFiles/FortranSupport.dir/OpenMP-utils.cpp.o
[7269/7955] Building CXX object tools/flang/lib/Optimizer/HLFIR/Transforms/CMakeFiles/HLFIRTransforms.dir/PropagateFortranVariableAttributes.cpp.o
[7270/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/MemoryUtils.cpp.o
[7271/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/AssumedRankOpConversion.cpp.o
[7272/7955] Building CXX object tools/flang/lib/Optimizer/OpenMP/CMakeFiles/FlangOpenMPTransforms.dir/LowerNontemporal.cpp.o
[7273/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/AnnotateConstant.cpp.o
[7274/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/MemoryAllocation.cpp.o
[7275/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/FIRToSCF.cpp.o
[7276/7955] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/cmake_pch.hxx.gch
[7277/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/AffineDemotion.cpp.o
[7278/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/ControlFlowConverter.cpp.o
[7279/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/AffinePromotion.cpp.o
[7280/7955] Building CXX object tools/flang/lib/Optimizer/OpenMP/CMakeFiles/FlangOpenMPTransforms.dir/LowerWorkshare.cpp.o
[7281/7955] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/intrinsics.test.dir/intrinsics.cpp.o
[7282/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/AbstractResult.cpp.o
[7283/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/CUFOpConversion.cpp.o
[7284/7955] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/cmake_pch.hxx.gch
[7285/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/FIRBuilder.cpp.o
[7286/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/CUFGPUToLLVMConversion.cpp.o
[7287/7955] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/folding.test.dir/folding.cpp.o
[7288/7955] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/expression.test.dir/expression.cpp.o
[7289/7955] Building CXX object tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIRDialect.cpp.o
[7290/7955] Building CXX object tools/flang/tools/flang-driver/CMakeFiles/flang.dir/fc1_main.cpp.o
[7291/7955] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/cmake_pch.hxx.gch
[7292/7955] Building CXX object tools/flang/tools/fir-lsp-server/CMakeFiles/fir-lsp-server.dir/fir-lsp-server.cpp.o
[7293/7955] Building CXX object tools/flang/tools/tco/CMakeFiles/tco.dir/tco.cpp.o
[7294/7955] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o
[7295/7955] Building CXX object tools/mlir/lib/CAPI/RegisterEverything/CMakeFiles/obj.MLIRCAPIRegisterEverything.dir/RegisterEverything.cpp.o
[7296/7955] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/MLIRMlirOptMain.dir/mlir-opt.cpp.o
[7297/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/PPCIntrinsicCall.cpp.o
[7298/7955] Building CXX object tools/flang/tools/f18-parse-demo/CMakeFiles/f18-parse-demo.dir/f18-parse-demo.cpp.o
ninja: build stopped: subcommand failed.
Step 7 (build cmake config) failure: build cmake config (failure)
...

@llvm-ci
Collaborator

llvm-ci commented Jul 23, 2025

LLVM Buildbot has detected a new failure on builder amdgpu-offload-rhel-9-cmake-build-only running on rocm-docker-rhel-9 while building mlir at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/205/builds/16463

Here is the relevant piece of the build log for reference:
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py --jobs=32' (failure)
...
[7333/7955] Creating library symlink lib/libMLIRAMDGPUDialect.so
[7334/7955] Linking CXX shared library lib/libMLIRAMDGPUUtils.so.22.0git
[7335/7955] Creating library symlink lib/libMLIRAMDGPUUtils.so
[7336/7955] Building CXX object tools/clang/tools/clang-check/CMakeFiles/clang-check.dir/ClangCheck.cpp.o
[7337/7955] Linking CXX shared library lib/libMLIRAMDGPUToROCDL.so.22.0git
[7338/7955] Creating library symlink lib/libMLIRAMDGPUToROCDL.so
[7339/7955] Building CXX object tools/clang/tools/libclang/CMakeFiles/libclang.dir/CIndex.cpp.o
[7340/7955] Linking CXX shared library lib/libMLIRArithToAMDGPU.so.22.0git
[7341/7955] Creating library symlink lib/libMLIRArithToAMDGPU.so
[7342/7955] Linking CXX shared library lib/libMLIRAMDGPUTransforms.so.22.0git
FAILED: lib/libMLIRAMDGPUTransforms.so.22.0git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -Wno-deprecated-copy -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRAMDGPUTransforms.so.22.0git -o lib/libMLIRAMDGPUTransforms.so.22.0git tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/EmulateAtomics.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/MaskedloadToLoad.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/ResolveStridedMetadata.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/build/lib:"  lib/libMLIRAMDGPUUtils.so.22.0git  lib/libMLIRSCFDialect.so.22.0git  lib/libMLIRVectorDialect.so.22.0git  lib/libMLIRControlFlowDialect.so.22.0git  lib/libMLIRFuncDialect.so.22.0git  lib/libMLIRTransforms.so.22.0git  lib/libMLIRTransformUtils.so.22.0git  lib/libMLIRAMDGPUDialect.so.22.0git  lib/libMLIRROCDLDialect.so.22.0git  lib/libMLIRLLVMDialect.so.22.0git  lib/libLLVMBitWriter.so.22.0git  lib/libLLVMBitReader.so.22.0git  lib/libLLVMAsmParser.so.22.0git  lib/libLLVMCore.so.22.0git  lib/libLLVMBinaryFormat.so.22.0git  lib/libMLIRGPUDialect.so.22.0git  lib/libMLIRDLTIDialect.so.22.0git  lib/libMLIRMathDialect.so.22.0git  
lib/libMLIRMemRefUtils.so.22.0git  lib/libMLIRTensorDialect.so.22.0git  lib/libMLIRParallelCombiningOpInterface.so.22.0git  lib/libMLIRAffineDialect.so.22.0git  lib/libMLIRMemRefDialect.so.22.0git  lib/libMLIRArithUtils.so.22.0git  lib/libMLIRComplexDialect.so.22.0git  lib/libMLIRArithDialect.so.22.0git  lib/libMLIRCastInterfaces.so.22.0git  lib/libMLIRInferIntRangeCommon.so.22.0git  lib/libMLIRDialect.so.22.0git  lib/libMLIRDialectUtils.so.22.0git  lib/libMLIRShapedOpInterfaces.so.22.0git  lib/libMLIRIndexingMapOpInterface.so.22.0git  lib/libMLIRMaskableOpInterface.so.22.0git  lib/libMLIRMaskingOpInterface.so.22.0git  lib/libMLIRVectorInterfaces.so.22.0git  lib/libMLIRSubsetOpInterface.so.22.0git  lib/libMLIRValueBoundsOpInterface.so.22.0git  lib/libMLIRDestinationStyleOpInterface.so.22.0git  lib/libMLIRRewrite.so.22.0git  lib/libMLIRRewritePDL.so.22.0git  lib/libMLIRPDLToPDLInterp.so.22.0git  lib/libMLIRPass.so.22.0git  lib/libMLIRPDLInterpDialect.so.22.0git  lib/libMLIRPDLDialect.so.22.0git  lib/libMLIRUBDialect.so.22.0git  lib/libMLIRMemorySlotInterfaces.so.22.0git  lib/libMLIRAnalysis.so.22.0git  lib/libMLIRSideEffectInterfaces.so.22.0git  lib/libMLIRInferIntRangeInterface.so.22.0git  lib/libMLIRInferTypeOpInterface.so.22.0git  lib/libMLIRControlFlowInterfaces.so.22.0git  lib/libMLIRViewLikeInterface.so.22.0git  lib/libMLIRLoopLikeInterface.so.22.0git  lib/libMLIRFunctionInterfaces.so.22.0git  lib/libMLIRDataLayoutInterfaces.so.22.0git  lib/libMLIRCallInterfaces.so.22.0git  lib/libMLIRPresburger.so.22.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.22.0git  lib/libMLIRIR.so.22.0git  lib/libMLIRSupport.so.22.0git  lib/libLLVMSupport.so.22.0git  -Wl,-rpath-link,/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/build/lib && :
/usr/bin/ld: tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o: in function `mlir::amdgpu::FoldMemRefOpsIntoGatherToLDSOp::matchAndRewrite(mlir::amdgpu::GatherToLDSOp, mlir::PatternRewriter&) const':
FoldMemRefsOps.cpp:(.text._ZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterE[_ZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterE]+0x2da): undefined reference to `mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides(mlir::RewriterBase&, mlir::Location, llvm::ArrayRef<mlir::OpFoldResult>, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallBitVector const&, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallVectorImpl<mlir::Value>&)'
collect2: error: ld returned 1 exit status
[7343/7955] Linking CXX shared library lib/libMLIRCAPIAMDGPU.so.22.0git
[7344/7955] Linking CXX shared library lib/libMLIRGPUTransforms.so.22.0git
[7345/7955] Building CXX object tools/flang/lib/Optimizer/OpenMP/CMakeFiles/FlangOpenMPTransforms.dir/LowerNontemporal.cpp.o
[7346/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/FIRToSCF.cpp.o
[7347/7955] Building CXX object tools/flang/lib/Optimizer/OpenMP/CMakeFiles/FlangOpenMPTransforms.dir/FunctionFiltering.cpp.o
[7348/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/AssumedRankOpConversion.cpp.o
[7349/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/AffineDemotion.cpp.o
[7350/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/CompilerGeneratedNames.cpp.o
[7351/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/CUFComputeSharedMemoryOffsetsAndSize.cpp.o
[7352/7955] Building CXX object tools/flang/lib/Support/CMakeFiles/FortranSupport.dir/OpenMP-utils.cpp.o
[7353/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/ExternalNameConversion.cpp.o
[7354/7955] Building CXX object tools/flang/lib/Optimizer/OpenMP/CMakeFiles/FlangOpenMPTransforms.dir/MapInfoFinalization.cpp.o
[7355/7955] Linking CXX shared library lib/libclang-cpp.so.22.0git
[7356/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/ArrayValueCopy.cpp.o
[7357/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/CUFDeviceGlobal.cpp.o
[7358/7955] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/CUFCommon.cpp.o
[7359/7955] Building CXX object tools/flang/lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/CUFGPUToLLVMConversion.cpp.o
[7360/7955] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/cmake_pch.hxx.gch
[7361/7955] Building CXX object tools/flang/lib/Parser/CMakeFiles/FortranParser.dir/cmake_pch.hxx.gch
[7362/7955] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/intrinsics.test.dir/intrinsics.cpp.o
[7363/7955] Building CXX object tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIRDialect.cpp.o
[7364/7955] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/folding.test.dir/folding.cpp.o
[7365/7955] Building CXX object tools/flang/unittests/Evaluate/CMakeFiles/expression.test.dir/expression.cpp.o
[7366/7955] Building CXX object tools/flang/tools/fir-lsp-server/CMakeFiles/fir-lsp-server.dir/fir-lsp-server.cpp.o
[7367/7955] Building CXX object tools/flang/tools/tco/CMakeFiles/tco.dir/tco.cpp.o
[7368/7955] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/MLIRMlirOptMain.dir/mlir-opt.cpp.o
[7369/7955] Building CXX object tools/mlir/lib/CAPI/RegisterEverything/CMakeFiles/obj.MLIRCAPIRegisterEverything.dir/RegisterEverything.cpp.o
[7370/7955] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/cmake_pch.hxx.gch
[7371/7955] Building CXX object tools/mlir/tools/mlir-reduce/CMakeFiles/mlir-reduce.dir/mlir-reduce.cpp.o
[7372/7955] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o
[7373/7955] Building CXX object tools/mlir/examples/transform-opt/CMakeFiles/mlir-transform-opt.dir/mlir-transform-opt.cpp.o
[7374/7955] Building CXX object tools/flang/lib/Optimizer/HLFIR/Transforms/CMakeFiles/HLFIRTransforms.dir/SimplifyHLFIRIntrinsics.cpp.o
[7375/7955] Building CXX object tools/flang/tools/f18-parse-demo/CMakeFiles/f18-parse-demo.dir/f18-parse-demo.cpp.o
ninja: build stopped: subcommand failed.
Step 7 (build cmake config) failure: build cmake config (failure)

@llvm-ci
Collaborator

llvm-ci commented Jul 23, 2025

LLVM Buildbot has detected a new failure on builder flang-aarch64-latest-gcc running on linaro-flang-aarch64-latest-gcc while building mlir at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/130/builds/14426

Here is the relevant piece of the build log for reference:
Step 5 (build-unified-tree) failure: build (failure)
...
  198 |   const TemplateArgument &FirstArg = TST->template_arguments()[0];
      |                           ^~~~~~~~
../llvm-project/clang/lib/Sema/HeuristicResolver.cpp:198:65: note: the temporary was destroyed at the end of the full expression ‘TST->clang::TemplateSpecializationType::template_arguments().llvm::ArrayRef<clang::TemplateArgument>::operator[](0)’
  198 |   const TemplateArgument &FirstArg = TST->template_arguments()[0];
      |                                                                 ^
290.730 [1815/28/5727] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/JumpDiagnostics.cpp.o
290.740 [1815/27/5728] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/ScopeInfo.cpp.o
290.752 [1815/26/5729] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Scope.cpp.o
290.756 [1815/25/5730] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaAccess.cpp.o
290.774 [1815/24/5731] Linking CXX shared library lib/libMLIRAMDGPUTransforms.so.22.0git
FAILED: lib/libMLIRAMDGPUTransforms.so.22.0git 
: && /usr/local/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wundef -Wno-unused-but-set-parameter -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libMLIRAMDGPUTransforms.so.22.0git -o lib/libMLIRAMDGPUTransforms.so.22.0git tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/EmulateAtomics.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/MaskedloadToLoad.cpp.o tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/ResolveStridedMetadata.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/lib:"  lib/libMLIRAMDGPUUtils.so.22.0git  lib/libMLIRSCFDialect.so.22.0git  lib/libMLIRVectorDialect.so.22.0git  lib/libMLIRControlFlowDialect.so.22.0git  lib/libMLIRFuncDialect.so.22.0git  lib/libMLIRTransforms.so.22.0git  lib/libMLIRTransformUtils.so.22.0git  lib/libMLIRAMDGPUDialect.so.22.0git  lib/libMLIRROCDLDialect.so.22.0git  lib/libMLIRLLVMDialect.so.22.0git  lib/libLLVMBitWriter.so.22.0git  lib/libLLVMBitReader.so.22.0git  lib/libLLVMAsmParser.so.22.0git  lib/libLLVMCore.so.22.0git  lib/libLLVMBinaryFormat.so.22.0git  lib/libMLIRGPUDialect.so.22.0git  lib/libMLIRDLTIDialect.so.22.0git  lib/libMLIRMathDialect.so.22.0git  lib/libMLIRMemRefUtils.so.22.0git  
lib/libMLIRTensorDialect.so.22.0git  lib/libMLIRParallelCombiningOpInterface.so.22.0git  lib/libMLIRAffineDialect.so.22.0git  lib/libMLIRMemRefDialect.so.22.0git  lib/libMLIRArithUtils.so.22.0git  lib/libMLIRComplexDialect.so.22.0git  lib/libMLIRArithDialect.so.22.0git  lib/libMLIRCastInterfaces.so.22.0git  lib/libMLIRInferIntRangeCommon.so.22.0git  lib/libMLIRDialect.so.22.0git  lib/libMLIRDialectUtils.so.22.0git  lib/libMLIRShapedOpInterfaces.so.22.0git  lib/libMLIRIndexingMapOpInterface.so.22.0git  lib/libMLIRMaskableOpInterface.so.22.0git  lib/libMLIRMaskingOpInterface.so.22.0git  lib/libMLIRVectorInterfaces.so.22.0git  lib/libMLIRSubsetOpInterface.so.22.0git  lib/libMLIRValueBoundsOpInterface.so.22.0git  lib/libMLIRDestinationStyleOpInterface.so.22.0git  lib/libMLIRRewrite.so.22.0git  lib/libMLIRRewritePDL.so.22.0git  lib/libMLIRPDLToPDLInterp.so.22.0git  lib/libMLIRPass.so.22.0git  lib/libMLIRPDLInterpDialect.so.22.0git  lib/libMLIRPDLDialect.so.22.0git  lib/libMLIRUBDialect.so.22.0git  lib/libMLIRMemorySlotInterfaces.so.22.0git  lib/libMLIRAnalysis.so.22.0git  lib/libMLIRSideEffectInterfaces.so.22.0git  lib/libMLIRInferIntRangeInterface.so.22.0git  lib/libMLIRInferTypeOpInterface.so.22.0git  lib/libMLIRControlFlowInterfaces.so.22.0git  lib/libMLIRViewLikeInterface.so.22.0git  lib/libMLIRLoopLikeInterface.so.22.0git  lib/libMLIRFunctionInterfaces.so.22.0git  lib/libMLIRDataLayoutInterfaces.so.22.0git  lib/libMLIRCallInterfaces.so.22.0git  lib/libMLIRPresburger.so.22.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.22.0git  lib/libMLIRIR.so.22.0git  lib/libMLIRSupport.so.22.0git  lib/libLLVMSupport.so.22.0git  -Wl,-rpath-link,/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/lib && :
/usr/bin/ld: tools/mlir/lib/Dialect/AMDGPU/Transforms/CMakeFiles/obj.MLIRAMDGPUTransforms.dir/FoldMemRefsOps.cpp.o: in function `mlir::amdgpu::FoldMemRefOpsIntoGatherToLDSOp::matchAndRewrite(mlir::amdgpu::GatherToLDSOp, mlir::PatternRewriter&) const':
FoldMemRefsOps.cpp:(.text._ZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterE[_ZNK4mlir6amdgpu30FoldMemRefOpsIntoGatherToLDSOp15matchAndRewriteENS0_13GatherToLDSOpERNS_15PatternRewriterE]+0x294): undefined reference to `mlir::affine::resolveIndicesIntoOpWithOffsetsAndStrides(mlir::RewriterBase&, mlir::Location, llvm::ArrayRef<mlir::OpFoldResult>, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallBitVector const&, llvm::ArrayRef<mlir::OpFoldResult>, llvm::SmallVectorImpl<mlir::Value>&)'
collect2: error: ld returned 1 exit status
290.783 [1815/23/5732] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaAPINotes.cpp.o
290.792 [1815/22/5733] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaARM.cpp.o
290.795 [1815/21/5734] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaAVR.cpp.o
290.803 [1815/20/5735] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaAMDGPU.cpp.o
290.972 [1815/19/5736] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/IdentifierResolver.cpp.o
290.978 [1815/18/5737] Linking CXX shared library lib/libMLIRAMDGPUToROCDL.so.22.0git
291.013 [1815/17/5738] Linking CXX shared library lib/libMLIRArithToAMDGPU.so.22.0git
291.152 [1815/16/5739] Linking CXX shared library lib/libMLIRGPUTransforms.so.22.0git
291.303 [1815/15/5740] Building CXX object tools/clang/lib/CrossTU/CMakeFiles/obj.clangCrossTU.dir/CrossTranslationUnit.cpp.o
291.452 [1815/14/5741] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/CodeCompleteConsumer.cpp.o
291.582 [1815/13/5742] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaAttr.cpp.o
291.644 [1815/12/5743] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/SemaAvailability.cpp.o
291.665 [1815/11/5744] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/AnalysisBasedWarnings.cpp.o
291.726 [1815/10/5745] Building CXX object tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.o
291.839 [1815/9/5746] Building CXX object tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/BackendUtil.cpp.o
292.636 [1815/8/5747] Linking CXX shared library lib/libclangAST.so.22.0git
296.608 [1815/7/5748] Building CXX object tools/mlir/lib/Dialect/Vector/Transforms/CMakeFiles/obj.MLIRVectorTransforms.dir/VectorTransferOpTransforms.cpp.o
297.326 [1815/6/5749] Building CXX object tools/mlir/lib/Dialect/Vector/Transforms/CMakeFiles/obj.MLIRVectorTransforms.dir/VectorEmulateNarrowType.cpp.o
363.018 [1815/5/5750] Building CXX object tools/mlir/tools/mlir-reduce/CMakeFiles/mlir-reduce.dir/mlir-reduce.cpp.o
366.789 [1815/4/5751] Building CXX object tools/mlir/lib/CAPI/RegisterEverything/CMakeFiles/obj.MLIRCAPIRegisterEverything.dir/RegisterEverything.cpp.o
370.678 [1815/3/5752] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/MLIRMlirOptMain.dir/mlir-opt.cpp.o
373.073 [1815/2/5753] Building CXX object tools/mlir/examples/transform-opt/CMakeFiles/mlir-transform-opt.dir/mlir-transform-opt.cpp.o
376.542 [1815/1/5754] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o
ninja: build stopped: subcommand failed.

lialan added a commit that referenced this pull request Jul 23, 2025
#150256)

…to `amdgpu.gather_to_lds` (#149851)"

This reverts commit dbc63f1.

Reverting because of a build dependency issue.
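
The undefined reference in the build logs above names a symbol in the `mlir::affine` namespace, while the failing link command lists `libMLIRAffineDialect.so` but no separate Affine utilities library. A minimal sketch of how one might check a link line for a given library follows. The library name `MLIRAffineUtils` and the helper names `linked_libraries` and `missing_libraries` are assumptions made for illustration; the log itself does not say which library provides the symbol.

```python
import re
from typing import List, Set


def linked_libraries(link_command: str) -> Set[str]:
    """Return names of shared libraries passed as inputs on a link line.

    Names are reported without the 'lib' prefix and the '.so...' suffix.
    The '-o <output>' argument is dropped first so the library being
    produced is not counted as one of its own inputs.
    """
    inputs_only = re.sub(r"(?:^|\s)-o\s+\S+", " ", link_command)
    return set(re.findall(r"lib/lib(\w+)\.so", inputs_only))


def missing_libraries(link_command: str, required: List[str]) -> List[str]:
    """List the required libraries that are absent from the link line."""
    present = linked_libraries(link_command)
    return [name for name in required if name not in present]


# Shortened excerpt of the failing link line from the log above.
LINK_EXCERPT = (
    "-o lib/libMLIRAMDGPUTransforms.so.22.0git "
    "lib/libMLIRAMDGPUUtils.so.22.0git lib/libMLIRMemRefUtils.so.22.0git "
    "lib/libMLIRAffineDialect.so.22.0git lib/libMLIRMemRefDialect.so.22.0git"
)

# Hypothetical: assumes the missing symbol is provided by MLIRAffineUtils.
print(missing_libraries(LINK_EXCERPT, ["MLIRAffineDialect", "MLIRAffineUtils"]))
```

With shared-library builds and `-Wl,-z,defs` on the link line, as in the logs above, every symbol must resolve when the library is linked, so a missing dependency fails immediately rather than at load time.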
rupprecht added a commit to rupprecht/llvm-project that referenced this pull request Jul 23, 2025
rupprecht added a commit to rupprecht/llvm-project that referenced this pull request Jul 23, 2025
@llvm-ci
Collaborator

llvm-ci commented Jul 23, 2025

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building mlir at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/39096

Here is the relevant piece of the build log for reference:
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
...
PASS: lld :: COFF/delayimports-error.test (98828 of 101836)
PASS: lld :: COFF/ctors_dtors_priority.s (98829 of 101836)
PASS: lld :: COFF/duplicate-absolute.s (98830 of 101836)
PASS: lld :: COFF/duplicate.test (98831 of 101836)
PASS: lld :: COFF/duplicate-cv.s (98832 of 101836)
PASS: lld :: COFF/comdat-selection-associative-largest.s (98833 of 101836)
PASS: lld :: COFF/defparser.test (98834 of 101836)
PASS: lld :: COFF/duplicate-dwarf.s (98835 of 101836)
PASS: lld :: COFF/dtlto/options.test (98836 of 101836)
TIMEOUT: MLIR :: Examples/standalone/test.toy (98837 of 101836)
******************** TEST 'MLIR :: Examples/standalone/test.toy' FAILED ********************
Exit Code: 1
Timeout: Reached timeout of 60 seconds

Command Output (stdout):
--
# RUN: at line 1
"/etc/cmake/bin/cmake" "/build/buildbot/premerge-monolithic-linux/llvm-project/mlir/examples/standalone" -G "Ninja"  -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang  -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/build/buildbot/premerge-monolithic-linux/build/lib/cmake/mlir  -DLLVM_USE_LINKER=lld  -DPython3_EXECUTABLE="/usr/bin/python3.10"
# executed command: /etc/cmake/bin/cmake /build/buildbot/premerge-monolithic-linux/llvm-project/mlir/examples/standalone -G Ninja -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/build/buildbot/premerge-monolithic-linux/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld -DPython3_EXECUTABLE=/usr/bin/python3.10
# .---command stdout------------
# | -- The CXX compiler identification is Clang 16.0.6
# | -- The C compiler identification is Clang 16.0.6
# | -- Detecting CXX compiler ABI info
# | -- Detecting CXX compiler ABI info - done
# | -- Check for working CXX compiler: /usr/bin/clang++ - skipped
# | -- Detecting CXX compile features
# | -- Detecting CXX compile features - done
# | -- Detecting C compiler ABI info
# | -- Detecting C compiler ABI info - done
# | -- Check for working C compiler: /usr/bin/clang - skipped
# | -- Detecting C compile features
# | -- Detecting C compile features - done
# | -- Looking for histedit.h
# | -- Looking for histedit.h - found
# | -- Found LibEdit: /usr/include (found version "2.11") 
# | -- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11") 
# | -- Found LibXml2: /usr/lib/x86_64-linux-gnu/libxml2.so (found version "2.9.13") 
# | -- Using MLIRConfig.cmake in: /build/buildbot/premerge-monolithic-linux/build/lib/cmake/mlir
# | -- Using LLVMConfig.cmake in: /build/buildbot/premerge-monolithic-linux/build/lib/cmake/llvm
# | -- Linker detection: unknown
# | -- Performing Test LLVM_LIBSTDCXX_MIN
# | -- Performing Test LLVM_LIBSTDCXX_MIN - Success
# | -- Performing Test LLVM_LIBSTDCXX_SOFT_ERROR
# | -- Performing Test LLVM_LIBSTDCXX_SOFT_ERROR - Success
# | -- Performing Test CXX_SUPPORTS_CUSTOM_LINKER
# | -- Performing Test CXX_SUPPORTS_CUSTOM_LINKER - Success
# | -- Performing Test C_SUPPORTS_FPIC
# | -- Performing Test C_SUPPORTS_FPIC - Success
# | -- Performing Test CXX_SUPPORTS_FPIC

lialan added a commit that referenced this pull request Jul 24, 2025
…nto `amdgpu.gather_to_lds`" (#150334)

This is a reapply of patch #149851. The reapply also fixes a CMake/Bazel
build issue, which was the reason for the revert. (Thanks @rupprecht)

Original patch (#149851) message:
-----
This PR adds a new optimization pass to fold
`memref.subview/expand_shape/collapse_shape` ops into consumer
`amdgpu.gather_to_lds` operations.
* Implements a new pass `AmdgpuFoldMemRefOpsPass` with pattern
`FoldMemRefOpsIntoGatherToLDSOp`
* Adds corresponding folding tests
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
…pu.gather_to_lds` (llvm#149851)

This PR adds a new optimization pass to fold
`memref.subview/expand_shape/collapse_shape` ops into consumer
`amdgpu.gather_to_lds` operations.

* Implements a new pass `AmdgpuFoldMemRefOpsPass` with pattern
`FoldMemRefOpsIntoGatherToLDSOp`
* Adds corresponding folding tests

---------

Co-authored-by: Copilot <[email protected]>
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
llvm#150256)

…to `amdgpu.gather_to_lds` (llvm#149851)"

This reverts commit dbc63f1.

Reverting because of a build dependency issue.
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
…nto `amdgpu.gather_to_lds`" (llvm#150334)

This is a reapply of patch llvm#149851. The reapply also fixes a CMake/Bazel
build issue, which was the reason for the revert. (Thanks @rupprecht)

Original patch (llvm#149851) message:
-----
This PR adds a new optimization pass to fold
`memref.subview/expand_shape/collapse_shape` ops into consumer
`amdgpu.gather_to_lds` operations.
* Implements a new pass `AmdgpuFoldMemRefOpsPass` with pattern
`FoldMemRefOpsIntoGatherToLDSOp`
* Adds corresponding folding tests